接續前一天,我們透過 AWS Load Balancer Controller 作為 Ingress Controller 部署了 Application Load Balancers(ALB)但是在更新 deployment 時卻產生了 HTTP 502 的 error。
一般來說,ALB 響應 HTTP 502 [1]原因來自於 Load Balancer 嘗試建立連線時收到 target 的 TCP RST。根據測試步驟來說,我們在嘗試更新 deployment image,此動作對於 Kubernetes 可分為兩個部分:
$ kubectl -n demo-nginx describe svc,endpoints
Name: service-demo-nginx
Namespace: demo-nginx
Labels: <none>
Annotations: <none>
Selector: app=demo-nginx
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.100.247.243
IPs: 10.100.247.243
Port: <unset> 80/TCP
TargetPort: 80/TCP
Endpoints: 192.168.16.64:80,192.168.31.134:80,192.168.40.177:80 + 7 more...
Session Affinity: None
Events: <none>
Name: service-demo-nginx
Namespace: demo-nginx
Labels: <none>
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-10-11T15:39:57Z
Subsets:
Addresses: 192.168.16.64,192.168.31.134,192.168.40.177,192.168.41.36,192.168.45.23,192.168.48.198,192.168.64.124,192.168.66.152,192.168.67.240,192.168.91.189
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 80 TCP
Events: <none>
透過 CloudWatch Log Insight syntax 查看 Endpoints 更新:
filter @logStream like /^kube-apiserver-audit/
| fields requestReceivedTimestamp
| sort @timestamp asc
| filter objectRef.resource == 'endpoints' AND objectRef.name == 'service-demo-nginx' AND verb == 'update'
| limit 10000
從輸出可以觀察到將近數十筆關於 Endpoints 更新 subnets IP 地址,正也是 Deployment rolling upgrade 時 Pod IP 更新。以下擷取一筆部分紀錄:
"@message": {
"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
"level": "RequestResponse",
"auditID": "7db6a466-c7ff-45a2-b3e8-9d21de586c3b",
"stage": "ResponseComplete",
"requestURI": "/api/v1/namespaces/demo-nginx/endpoints/service-demo-nginx",
"verb": "update",
"user": {
"username": "system:serviceaccount:kube-system:endpoint-controller",
...
...
"requestObject": {
"kind": "Endpoints",
"apiVersion": "v1",
"metadata": {
"name": "service-demo-nginx",
"namespace": "demo-nginx",
"uid": "2468c3ba-1bb3-4740-957b-b7a0ce58f9d0",
"resourceVersion": "7959165",
"creationTimestamp": "2022-10-11T14:31:21Z",
...
...
"subsets": [
{
"addresses": [
{
"ip": "192.168.19.10",
"nodeName": "ip-192-168-18-190.eu-west-1.compute.internal",
"targetRef": {
"kind": "Pod",
"namespace": "demo-nginx",
"name": "demo-nginx-deployment-78dc5d6bdb-gdtwz",
"uid": "bf69b974-c06f-49b9-a09f-308e76aabe11",
"resourceVersion": "7958990"
}
},
{
"ip": "192.168.35.103",
"nodeName": "ip-192-168-55-245.eu-west-1.compute.internal",
"targetRef": {
"kind": "Pod",
"namespace": "demo-nginx",
"name": "demo-nginx-deployment-78dc5d6bdb-9xctc",
"uid": "eada6e94-6dc9-4623-b796-5dd720402ab7",
"resourceVersion": "7959042"
}
},
...
...
...
因此也能可以於 AWS Load Balancer Controller Pod 上過濾 Target Group id 查看到 registering/deregistering 更新對應 IP 地址。
$ kubectl -n kube-system logs aws-load-balancer-controller-b9b6478cf-ztfrd | grep "k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"
...
...
...
{"level":"info","ts":1665502791.4772518,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1a","Id":"192.168.22.141","Port":80},{"AvailabilityZone":"eu-west-1b","Id":"192.168.88.25","Port":80}]}
{"level":"info","ts":1665502791.5222945,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502793.781365,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.45.23","Port":80}]}
{"level":"info","ts":1665502794.109826,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502794.2468624,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1c","Id":"192.168.38.205","Port":80},{"AvailabilityZone":"eu-west-1c","Id":"192.168.53.217","Port":80}]}
{"level":"info","ts":1665502794.2834873,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502794.2835267,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.40.177","Port":80}]}
{"level":"info","ts":1665502794.4290822,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502794.5652397,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1b","Id":"192.168.76.79","Port":80},{"AvailabilityZone":"eu-west-1b","Id":"192.168.90.183","Port":80}]}
{"level":"info","ts":1665502794.6063979,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502794.6082087,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.31.134","Port":80},{"AvailabilityZone":null,"Id":"192.168.67.240","Port":80}]}
{"level":"info","ts":1665502794.7579286,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502795.0438414,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1a","Id":"192.168.19.10","Port":80}]}
{"level":"info","ts":1665502795.0870168,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502795.0870724,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.66.152","Port":80}]}
{"level":"info","ts":1665502795.2049546,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502796.5167701,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.91.189","Port":80}]}
{"level":"info","ts":1665502796.6328108,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502796.765318,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1c","Id":"192.168.35.103","Port":80},{"AvailabilityZone":"eu-west-1c","Id":"192.168.61.232","Port":80}]}
{"level":"info","ts":1665502796.809031,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502796.8090775,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.16.64","Port":80}]}
{"level":"info","ts":1665502796.952015,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502797.1036577,"msg":"deRegistering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":"eu-west-1b","Id":"192.168.83.233","Port":80}]}
{"level":"info","ts":1665502797.151773,"msg":"deRegistered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502797.151815,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.41.36","Port":80},{"AvailabilityZone":null,"Id":"192.168.64.124","Port":80}]}
{"level":"info","ts":1665502797.338572,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
{"level":"info","ts":1665502797.4691834,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2","targets":[{"AvailabilityZone":null,"Id":"192.168.48.198","Port":80}]}
{"level":"info","ts":1665502797.6005466,"msg":"registered targets","arn":"arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2"}
同時,過濾 CloudTrail DeregisterTargets
同樣能觀察到使用 AWS Load Balancer Controller IRSA IAM role 更新 Target Group IP 地址資訊。
...
...
"userIdentity": {
"type": "AssumedRole",
"principalId": "AROAYFMQSNSERYVRZKOIQ:1665499421108782293",
"arn": "arn:aws:sts::111111111111:assumed-role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-MW90E4BAEQWQ/1665499421108782293",
...
...
...
"eventTime": "2022-10-11T15:39:51Z",
"eventSource": "elasticloadbalancing.amazonaws.com",
"eventName": "DeregisterTargets",
...
"userAgent": "elbv2.k8s.aws/v2.4.3 aws-sdk-go/1.42.27 (go1.17.10; linux; amd64)",
"requestParameters": {
"targetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2",
"targets": [
{
"id": "192.168.22.141",
"port": 80,
"availabilityZone": "eu-west-1a"
},
{
"id": "192.168.88.25",
"port": 80,
"availabilityZone": "eu-west-1b"
}
]
}
根據 2019 Kubecon talk [2] Ready? A Deep Dive into Pod Readiness Gates for Service Health... - Minhan Xia & Ping Zou [3]。提及若 Pod 為 Kubernetes Service Endpoint backend 流程:
kube-scheduler
調度指派至選定 Node 上。kubelet
接收 API server 更新(watch
verb),並透過 Container Runtime 如 Docker 建立 container。
Ready
回 kube-apiserver
watch
verb 接收更新資訊,當 Pod 進入至 Ready 狀態後,則會增加至 Service Object 上 Endpoints list,資訊包含 Pod IP/Port。kube-proxy
透過 watch
verb 接收 API server 更新對應新增 Service IP/Port 資訊至 iptable 路由規則。來源: Ready? A Deep Dive into Pod Readiness Gates for Service Health... - Minhan Xia & Ping Zou
在第一個 Pod 更新狀態至 Ready
之前,並不會部署第二個 Pod。值得注意的是,我們在更新 Deployment rolling upgrade 過程中,包含了上述步驟 4 集步驟 5 是同步進行。
換言之,在 Deployment Controller 更新下一個 Pod 時,新的 Pod 狀態操作仍在進行中。由於此過程仍會終止舊版本的 Pod,導致新 Pod 已經更新至 Ready
狀態但這些更新仍在進行中並且舊版本的 pod 已終止的情況。
在此次 EKS 環境上使用 AWS Load Balancer Controller 預設並不考慮在 ALB registration times 或是 health checks 時間。
根據上述流程,ALB 已經成功註冊 Pod IP 至 Target group 後,開始將請求轉發給 Pod,然而 Pod 內應用服務仍處於初始化階段,儘管 Pod 狀態更新為 Ready
狀態,但實際仍無法處理請求,而導致 HTTP 500-level error。
反之,當 Pod 被終止時,Pod 仍需等候 Service/Endpoint/Ingress Controller 更新並發送對應 AWS API 進行 registering/deregistering。預設 Kubernetes 也不會設定 ALB Target Group Deregistration delay [4] 屬性,這可能會導致 ALB 仍向已終止的 Pod 的 IP/Port 發送請求。